Increasing sales and profit margins is one of the most crucial priorities for business owners. Thus, the owner of a local cafe asked me to analyze three datasets to increase profits and to predict future profits.
How could a cafe increase its profits?
To answer this question, discover new information, or confirm ideas the owner already has, I plan to explore the three datasets, forecast daily profits with several time-series models (AR, MA, auto-ARIMA, ETS, and Prophet), and model hourly visitor counts with a decision tree.
The visit data comes from a device the cafe owner mounted above the main door to count visitors and record a few extra variables. It has 7 variables and 8089 observations.
| Variable | Description |
|---|---|
| Date | The date |
| Day | The day of the week |
| Time | The hour of the day |
| ValueIn | The number of visitors entering the cafe |
| ValueOut | The number of visitors leaving the cafe |
| TurnInRate (%) | The percentage of outside traffic that turns in |
| OutsideTraffic | The number of people passing outside the cafe |
The items data is from the sales website; it has 5 variables and 96 observations.
| Variable | Description |
|---|---|
| item | The name of the item |
| count | The number of pieces sold |
| price | The total sales price |
| cost | The cost of the items |
| profits | The amount of money earned |
The sales data also comes from the sales website the owner uses directly; it has 6 variables and 338 observations.
| Variable | Description |
|---|---|
| Date | The date |
| Total sales | The total number of sales |
| Items cost | The cost of the sold items |
| Taxes | Additional fees |
| Offers | Discounts or other offers |
| Profits | The amount of money earned |
In the visit data, the variables are positively skewed because the data is hourly and contains a large number of zeros and ones. The data is also limited, since the cafe has been open for less than a year.
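The skew is easy to confirm numerically. A minimal illustration with EnvStats::skewness on simulated hourly-style counts (simulated data, not the actual ValueIn values):

```r
library(EnvStats)

set.seed(42)
# Simulated stand-in for an hourly visitor count: mostly zeros and ones
x <- rpois(1000, lambda = 0.8)
skewness(x) # positive value indicates right skew
```

The same call on ValueIn or OutsideTraffic would quantify the skew visible in the histograms below.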
## Load packages
library(tidyverse)
library(kknn)
library(ggrepel)
library(corrplot)
library(dplyr) # for data manipulation (also attached by tidyverse)
library(EnvStats)
library(RColorBrewer)
library(vip) # for variable importance
#Time Series
library(tseries)
library(forecast)
library(xts)
library(fpp3)
library(tsibble)
library(prophet)
# Plotting
library(reshape2)
library(plotly)
library(ggplot2) # also attached by tidyverse
library(ggpubr) # To arrange plots on one page
# Modeling process
library(tidymodels)
#Downloading the data
sales.data <- read.csv("~/Documents/GitHub/DS_Capstone_Cafe_Analysis/sales-data.csv")
# Plotting the response
Prof.plot <- ggplot(data=sales.data, aes(x=Profits)) +
geom_histogram(fill="black",bins = 30) +
xlab("The amount of money earned") +
NULL
## The rest of the variables
cost.plot <- ggplot(data=sales.data, aes(x=Items.cost)) +
geom_histogram(fill="black",bins = 30) +
xlab("The cost of the sold items") +
NULL
tax.plot <- ggplot(data=sales.data, aes(x=Taxes)) +
geom_histogram(fill="black",bins = 30) +
xlab("Additional fee") +
NULL
offer.plot <- ggplot(data=sales.data, aes(x=Offers)) +
geom_histogram(fill="black",bins = 30) +
xlab("Discount or some offers") +
NULL
tot.sale.plot <- ggplot(data=sales.data, aes(x= Total.sales)) +
geom_histogram(fill="black",bins = 30) +
xlab("The total number of sales") +
NULL
ggarrange(Prof.plot, cost.plot, tax.plot, offer.plot, tot.sale.plot + rremove("x.text"),
          ncol = 3, nrow = 2)
###########################################
# This file loads monthly sales per day
# We will use these data to forecast
###########################################
# Remove the observations from 2020-11-26 until 2020-11-30, since this partial month would not show a full monthly effect
sales.data <- slice(sales.data,-c(303:333))
# Keeping only the Date and Profits columns
ts.sales <- select(sales.data,c(1,6))
# Add some features to our dataset
ts.sales$Date = ymd(ts.sales$Date)
ts.sales$year = year(ts.sales$Date)
ts.sales$month = month(ts.sales$Date)
ts.sales$day = day(ts.sales$Date)
#Declare this as time series data
# Daily data
## Create a daily Date object to index the series
inds <- seq(as.Date("2021-01-01"), as.Date("2021-10-29"), by = "day")
## Create a time series object
set.seed(25)
myseries1 <- ts(ts.sales[,2],
                start = c(2021, as.numeric(format(inds[1], "%j"))),
                frequency = 302) # note: 302 is not a standard seasonal period, so several seasonal tools below warn or fail
###########################################
# Preliminary Analysis
###########################################
# Time Plot
autoplot(myseries1) + ggtitle("Time Plot: Daily Gross Profits") +
  ylab("Thousands of Riyals")
# The data show some trend. Investigate transformations.
# Take the first difference of the data to remove the trend.
DY <- diff(myseries1)
# Time plot of the differenced data
autoplot(DY) +
  ggtitle("Time Plot: Change in Daily Gross Profits") +
  ylab("Thousands of Riyals")
# The differenced series appears stationary; use it to investigate seasonality.
# The seasonal plots below do not work because the series lacks a standard seasonal frequency:
# ggseasonplot(DY) +
#   ggtitle("Seasonal Plot: Change in Daily Gross Profits") +
#   ylab("Thousands of Riyals")
# subseries plot (fails for the same reason)
# ggsubseriesplot(DY)
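Stationarity of the differenced series can also be checked formally with an augmented Dickey-Fuller test (tseries::adf.test, from the tseries package loaded above). A self-contained illustration on simulated data — running adf.test(DY) on the differenced profits would be the analogous step here:

```r
library(tseries)

set.seed(1)
# A random walk typically fails the ADF test, while its first difference passes
rw <- cumsum(rnorm(300))
adf.test(rw)       # fails to reject a unit root
adf.test(diff(rw)) # small p-value: evidence of stationarity
```

A small p-value for DY would support treating the differenced profits as stationary before fitting ARMA models.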
############################################
# ARMA MODEL
############################################
# Plot the time series
ts.plot(myseries1, xlab="Year", ylab="Daily Gross Profits", main="Daily Gross Profits")
# Fit a trend line
abline(reg=lm(myseries1~time(myseries1)))
# Autocorrelation: describes how strongly the present value of the series is related to its past values
acf(myseries1)
############################################
#Fitting the AR Model to the time series
############################################
AR <- arima(myseries1, order = c(1,0,0))
print(AR)
##
## Call:
## arima(x = myseries1, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.5933 1030.6234
## s.e. 0.0464 64.3063
##
## sigma^2 estimated as 211815: log likelihood = -2318.27, aic = 4642.55
#plotting the series along with the fitted values
ts.plot(myseries1)
AR_fit <- myseries1 - residuals(AR)
points(AR_fit, type = "l", col = 2, lty = 2)
# Forecasting using the AR model
#Using predict() to make a 1-step forecast
predict_AR <- predict(AR)
#Obtaining the 1-step forecast using $pred[1]
predict_AR$pred[1]
## [1] 1296.144
#Alternatively, using predict to make 1-step through 10-step forecasts
predict(AR, n.ahead = 10)
## $pred
## Time Series:
## Start = c(2022, 6)
## End = c(2022, 15)
## Frequency = 302
## [1] 1296.144 1188.166 1124.099 1086.085 1063.531 1050.149 1042.208 1037.497
## [9] 1034.702 1033.043
##
## $se
## Time Series:
## Start = c(2022, 6)
## End = c(2022, 15)
## Frequency = 302
## [1] 460.2340 535.1484 559.1380 567.3420 570.2021 571.2055 571.5584 571.6826
## [9] 571.7263 571.7417
#plotting the series plus the forecast and 95% prediction intervals
ts.plot(myseries1)
AR_forecast <- predict(AR, n.ahead = 10)$pred
AR_forecast_se <- predict(AR, n.ahead = 10)$se
points(AR_forecast, type = "l", col = 2)
points(AR_forecast - 2*AR_forecast_se, type = "l", col = 2, lty = 2)
points(AR_forecast + 2*AR_forecast_se, type = "l", col = 2, lty = 2)
############################################
#Fit the MA model
############################################
#Fitting the MA model to the time series
MA <- arima(myseries1, order = c(0,0,1))
print(MA)
##
## Call:
## arima(x = myseries1, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.5851 1024.9978
## s.e. 0.0393 42.1326
##
## sigma^2 estimated as 217433: log likelihood = -2322.28, aic = 4650.57
#plotting the series along with the MA fitted values
ts.plot(myseries1)
MA_fit <- myseries1 - resid(MA)
points(MA_fit, type = "l", col = 2, lty = 2)
# Forecasting using the MA model
#Making a 1-step forecast based on MA
predict_MA <- predict(MA)
#Obtaining the 1-step forecast using $pred[1]
predict_MA$pred[1]
## [1] 1286.585
#Alternatively, making a 1-step through 10-step forecast based on MA
predict(MA,n.ahead=10)
## $pred
## Time Series:
## Start = c(2022, 6)
## End = c(2022, 15)
## Frequency = 302
## [1] 1286.585 1024.998 1024.998 1024.998 1024.998 1024.998 1024.998 1024.998
## [9] 1024.998 1024.998
##
## $se
## Time Series:
## Start = c(2022, 6)
## End = c(2022, 15)
## Frequency = 302
## [1] 466.2969 540.2385 540.2385 540.2385 540.2385 540.2385 540.2385 540.2385
## [9] 540.2385 540.2385
#Plotting the series plus the forecast and 95% prediction intervals
ts.plot(myseries1)
MA_forecasts <- predict(MA, n.ahead = 10)$pred
MA_forecast_se <- predict(MA, n.ahead = 10)$se
points(MA_forecasts, type = "l", col = 2)
points(MA_forecasts - 2*MA_forecast_se, type = "l", col = 2, lty = 2)
points(MA_forecasts + 2*MA_forecast_se, type = "l", col = 2, lty = 2)
# Choosing AR or MA: comparing the fits
# Find correlation between AR_fit and MA_fit
cor(AR_fit, MA_fit)
## [1] 0.8805198
# Find AIC of AR
AIC(AR)
## [1] 4642.548
# Find AIC of MA
AIC(MA)
## [1] 4650.568
# Find BIC of AR
BIC(AR)
## [1] 4653.728
# Find BIC of MA
BIC(MA)
## [1] 4661.749
# The AR model has the lower AIC and BIC, so I will go with the AR model.
############################################
#AUTO ARIMA MODEL
############################################
## use auto.arima to choose ARIMA terms
fit.arima <- auto.arima(myseries1)
## Warning: The chosen seasonal unit root test encountered an error when testing for the first difference.
## From stl(): series is not periodic or has less than two periods
## 0 seasonal differences will be used. Consider using a different unit root test.
## forecast for next 100 time points
forecast_arima <- forecast(fit.arima, h = 100)
## plot it
plot(forecast_arima)
# Model evaluation
print(summary(fit.arima))
## Series: myseries1
## ARIMA(5,1,2)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1 ma2
## 0.2977 -0.7274 -0.1919 -0.3678 -0.3828 -0.8115 0.5686
## s.e. 0.1240 0.0670 0.0808 0.0521 0.0809 0.1296 0.1086
##
## sigma^2 estimated as 141620: log likelihood=-2247.01
## AIC=4510.01 AICc=4510.5 BIC=4539.8
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -4.135021 371.3887 280.7149 -16.78163 39.05975 0.5019812
## ACF1
## Training set 0.005920115
checkresiduals(fit.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,1,2)
## Q* = 100.32, df = 54, p-value = 0.000132
##
## Model df: 7. Total lags used: 61
###########################################
#Fit ETS method
###########################################
fit_ets <- ets(myseries1) # Residual SD = 515.6988
## Warning in ets(myseries1): I can't handle data with frequency greater than 24.
## Seasonality will be ignored. Try stlf() if you need seasonal forecasts.
print(summary(fit_ets))
## ETS(M,Ad,N)
##
## Call:
## ets(y = myseries1)
##
## Smoothing parameters:
## alpha = 0.8683
## beta = 1e-04
## phi = 0.98
##
## Initial states:
## l = 1352.1537
## b = 181.3039
##
## sigma: 0.5145
##
## AIC AICc BIC
## 5539.413 5539.693 5561.774
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -32.73531 515.6362 388.3124 -21.53877 47.69954 0.6943896
## ACF1
## Training set -0.001126532
checkresiduals(fit_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,Ad,N)
## Q* = 479.4, df = 56, p-value < 2.2e-16
##
## Model df: 5. Total lags used: 61
############################################
# Facebook Prophet
############################################
# Adjusting the data
sales.data <- select(sales.data, c(1,6))
names(sales.data) <- c('ds', 'y')
# Making a Forecast
fb.ts.s <- prophet(sales.data, yearly.seasonality=FALSE ,daily.seasonality=FALSE)
future.s <- make_future_dataframe(fb.ts.s , periods=365)
tail(future.s)
## ds
## 667 2022-10-24
## 668 2022-10-25
## 669 2022-10-26
## 670 2022-10-27
## 671 2022-10-28
## 672 2022-10-29
# Forecasting
forecast.s <- predict(fb.ts.s, future.s)
tail(forecast.s[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
## ds yhat yhat_lower yhat_upper
## 667 2022-10-24 1676.799 861.9544 2403.852
## 668 2022-10-25 1719.657 934.6177 2586.065
## 669 2022-10-26 1805.081 991.6498 2599.370
## 670 2022-10-27 2159.184 1278.4996 2925.468
## 671 2022-10-28 2512.561 1682.1670 3297.642
## 672 2022-10-29 1878.252 1023.7380 2699.135
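One way to gauge how trustworthy these yhat values are (not done in the original analysis) is Prophet's built-in rolling-origin cross-validation. A self-contained sketch on simulated daily data — the same calls would apply to fb.ts.s:

```r
library(prophet)

set.seed(7)
# Simulated daily sales-like series with a weekly pattern
df <- data.frame(
  ds = seq(as.Date("2021-01-01"), by = "day", length.out = 200),
  y  = 1000 + 100 * sin(2 * pi * seq_len(200) / 7) + rnorm(200, sd = 50)
)
m <- prophet(df, yearly.seasonality = FALSE, daily.seasonality = FALSE)
# Evaluate 30-day-ahead forecasts from rolling cutoffs
df.cv <- cross_validation(m, initial = 120, period = 20, horizon = 30, units = "days")
head(performance_metrics(df.cv))
```

performance_metrics reports RMSE, MAE, and MAPE by horizon, which would show how fast the profit forecasts degrade.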
plot(fb.ts.s, forecast.s)
prophet_plot_components(fb.ts.s, forecast.s)
# Downloading the data
visit.dat <- read.csv("~/Documents/GitHub/DS_Capstone_Cafe_Analysis/Hourlydata.csv")
VI.plot <- ggplot(data=visit.dat, aes(x=ValueIn)) +
geom_histogram(fill="black",bins = 30) +
xlab("The number of visitors") +
NULL
summary(visit.dat$ValueIn)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 8.643 12.000 107.000
valueout.plot <- ggplot(data=visit.dat[!is.na(visit.dat$ValueOut),], aes(x=ValueOut)) +
geom_histogram(fill="black",bins = 30) +
xlab("number of visitors leaving the cafe")+
NULL
summary(visit.dat$ValueOut)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 8.307 11.000 122.000
OT.plot <- ggplot(data=visit.dat, aes(x=OutsideTraffic)) +
geom_histogram(fill="black",bins = 30) +
xlab("The number of people outside the cafe") +
NULL
summary(visit.dat$OutsideTraffic)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 10.0 295.9 82.0 6562.0
TI.plot <- ggplot(data=visit.dat, aes(x=TurnInRate)) +
geom_histogram(fill="black",bins = 30) +
xlab("The rate of visitors turn in") +
NULL
summary(visit.dat$TurnInRate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 23.27 35.71 100.00
day.plot <- ggplot(visit.dat, aes(x = factor(Day), y = ValueIn, fill = factor(Day))) +
  stat_summary(fun = mean, geom = "bar") + # mean hourly visitors per weekday
  xlab("The days") +
  NULL
table(visit.dat$Day)
##
## Friday Monday Saturday Sunday Thursday Tuesday Wednesday
## 1153 1152 1152 1152 1176 1152 1152
ggarrange(VI.plot, valueout.plot, OT.plot, TI.plot, day.plot + rremove("x.text"), ncol = 3, nrow = 2)
I decided to remove the variable ValueOut, since it adds little information beyond the number of visitors entering.
# Removing the ValueOut variable
visit.dat <- select(visit.dat, -c(4))
set.seed(123) # for reproducibility
visit.split <- initial_split(visit.dat, prop = 0.7)
visit.train <- training(visit.split)
visit.test <- testing(visit.split)
# create resampling procedure
set.seed(13)
kfold <- vfold_cv(visit.train, v = 5)
# Step 1: create decision tree model object
dt_mod <- decision_tree(mode = "regression") %>% set_engine("rpart")
# Step 2: create model recipe
model_recipe <- recipe(
  ValueIn ~ .,
  data = visit.train
)
# Step 3: fit model workflow
dt_fit <- workflow() %>%
add_recipe(model_recipe) %>%
add_model(dt_mod) %>%
fit(data = visit.train)
# Step 4: results
dt_fit
## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: decision_tree()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 0 Recipe Steps
##
## ── Model ───────────────────────────────────────────────────────────────────────
## n= 5662
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 5662 1.191322e+06 8.410456000
## 2) Time=0:00,1:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00 4062 1.231171e+05 1.985229000
## 4) TurnInRate< 0.175 2838 4.996829e+00 0.001057082 *
## 5) TurnInRate>=0.175 1224 8.603299e+04 6.585784000
## 10) Date=2020-11-26,2020-11-29,2020-12-03,2020-12-05,2020-12-06,2020-12-08,2020-12-14,2020-12-15,2020-12-16,2020-12-21,2020-12-23,2020-12-29,2021-01-03,2021-01-05,2021-01-10,2021-01-12,2021-01-13,2021-01-19,2021-01-21,2021-01-24,2021-01-25,2021-01-31,2021-02-02,2021-02-03,2021-02-04,2021-02-05,2021-02-06,2021-02-07,2021-02-08,2021-02-10,2021-02-12,2021-02-13,2021-02-14,2021-02-15,2021-02-16,2021-02-17,2021-02-18,2021-02-19,2021-02-20,2021-02-21,2021-02-22,2021-02-23,2021-02-24,2021-02-25,2021-02-26,2021-02-27,2021-02-28,2021-03-01,2021-03-02,2021-03-03,2021-03-04,2021-03-05,2021-03-06,2021-03-07,2021-03-08,2021-03-09,2021-03-10,2021-03-11,2021-03-12,2021-03-13,2021-03-14,2021-03-15,2021-03-16,2021-03-17,2021-03-18,2021-03-19,2021-03-20,2021-03-21,2021-03-22,2021-03-23,2021-03-24,2021-03-25,2021-03-26,2021-03-27,2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01,2021-04-02,2021-04-03,2021-04-04,2021-04-05,2021-04-06,2021-04-07,2021-04-08,2021-04-09,2021-04-11,2021-04-12,2021-04-22,2021-04-23,2021-04-25,2021-04-26,2021-04-27,2021-04-28,2021-04-29,2021-05-02,2021-05-03,2021-05-05,2021-05-06,2021-05-07,2021-05-08,2021-05-09,2021-05-10,2021-05-11,2021-05-13,2021-05-14,2021-05-15,2021-05-16,2021-05-17,2021-05-18,2021-05-19,2021-05-20,2021-05-21,2021-05-22,2021-05-23,2021-05-24,2021-05-25,2021-05-26,2021-05-27,2021-05-28,2021-05-29,2021-05-30,2021-05-31,2021-06-01,2021-06-02,2021-06-03,2021-06-04,2021-06-05,2021-06-06,2021-06-07,2021-06-08,2021-06-09,2021-06-10,2021-06-11,2021-06-12,2021-06-13,2021-06-14,2021-06-15,2021-06-16,2021-06-17,2021-06-18,2021-06-19,2021-06-20,2021-06-21,2021-06-22,2021-06-23,2021-06-24,2021-06-25,2021-06-26,2021-06-27,2021-06-28,2021-06-29,2021-06-30,2021-07-01,2021-07-02,2021-07-04,2021-07-05,2021-07-06,2021-07-07,2021-07-08,2021-07-09,2021-07-10,2021-07-11,2021-07-12,2021-07-13,2021-07-14,2021-07-15,2021-07-16,2021-07-17,2021-07-18,2021-07-19,2021-07-20,2021-07-21,2021-07-22,2021-07-23,2021-07-24,2021-07-25,2021-07-26,2021-07-27,2021-07-
28,2021-07-29,2021-07-30,2021-07-31,2021-08-01,2021-08-02,2021-08-03,2021-08-04,2021-08-05,2021-08-06,2021-08-07,2021-08-08,2021-08-09,2021-08-10,2021-08-11,2021-08-15,2021-08-16,2021-08-17,2021-08-18,2021-08-19,2021-08-20,2021-08-21,2021-08-22,2021-08-23,2021-08-24,2021-08-25,2021-08-26,2021-08-28,2021-08-29,2021-08-30,2021-08-31,2021-09-01,2021-09-02,2021-09-03,2021-09-05,2021-09-06,2021-09-07,2021-09-08,2021-09-09,2021-09-10,2021-09-11,2021-09-12,2021-09-13,2021-09-14,2021-09-15,2021-09-16,2021-09-18,2021-09-20,2021-09-21,2021-09-22,2021-09-23,2021-09-24,2021-09-25,2021-09-26,2021-09-27,2021-09-28,2021-09-29,2021-09-30,2021-10-01,2021-10-02,2021-10-03,2021-10-05,2021-10-06,2021-10-07,2021-10-08,2021-10-09,2021-10-10,2021-10-11,2021-10-12,2021-10-13,2021-10-14,2021-10-15,2021-10-17,2021-10-18,2021-10-20,2021-10-23,2021-10-24,2021-10-28 1136 2.486614e+04 5.040493000 *
## 11) Date=2020-11-27,2020-11-28,2020-11-30,2020-12-01,2020-12-02,2020-12-04,2020-12-07,2020-12-09,2020-12-10,2020-12-12,2020-12-17,2020-12-18,2020-12-19,2020-12-22,2020-12-24,2020-12-25,2020-12-27,2020-12-28,2020-12-31,2021-01-01,2021-01-02,2021-01-04,2021-01-06,2021-01-07,2021-01-08,2021-01-09,2021-01-11,2021-01-14,2021-01-15,2021-01-16,2021-01-17,2021-01-18,2021-01-20,2021-01-22,2021-01-23,2021-01-26,2021-01-27,2021-01-28,2021-01-29,2021-02-01,2021-04-24,2021-05-01,2021-05-04,2021-07-03,2021-08-14,2021-09-04,2021-10-16,2021-10-19,2021-10-21,2021-10-22,2021-10-25,2021-10-27 88 2.343590e+04 26.534090000 *
## 3) Time=17:00,18:00,19:00,20:00,21:00,22:00,23:00 1600 4.747788e+05 24.722500000
## 6) Date=2020-11-26,2020-11-29,2020-11-30,2020-12-12,2020-12-13,2020-12-14,2020-12-15,2020-12-19,2020-12-20,2020-12-21,2020-12-22,2020-12-23,2020-12-27,2020-12-28,2020-12-29,2020-12-30,2021-01-03,2021-01-04,2021-01-11,2021-01-17,2021-01-21,2021-01-24,2021-01-25,2021-01-26,2021-01-27,2021-02-03,2021-02-05,2021-02-06,2021-02-07,2021-02-08,2021-02-09,2021-02-10,2021-02-11,2021-02-12,2021-02-13,2021-02-14,2021-02-15,2021-02-16,2021-02-17,2021-02-18,2021-02-19,2021-02-20,2021-02-21,2021-02-22,2021-02-23,2021-02-24,2021-02-25,2021-02-26,2021-02-27,2021-02-28,2021-03-01,2021-03-02,2021-03-03,2021-03-04,2021-03-05,2021-03-06,2021-03-07,2021-03-09,2021-03-11,2021-03-12,2021-03-13,2021-03-14,2021-03-15,2021-03-16,2021-03-17,2021-03-18,2021-03-20,2021-03-21,2021-03-24,2021-03-27,2021-03-28,2021-03-29,2021-03-31,2021-04-03,2021-04-04,2021-04-05,2021-04-07,2021-04-10,2021-04-11,2021-04-12,2021-04-13,2021-04-14,2021-04-15,2021-04-16,2021-04-17,2021-04-18,2021-04-19,2021-04-20,2021-04-21,2021-04-22,2021-04-23,2021-04-24,2021-04-25,2021-04-26,2021-04-27,2021-04-28,2021-04-29,2021-04-30,2021-05-01,2021-05-02,2021-05-03,2021-05-04,2021-05-05,2021-05-06,2021-05-07,2021-05-08,2021-05-09,2021-05-10,2021-05-11,2021-05-12,2021-05-13,2021-05-17,2021-05-19,2021-05-22,2021-05-23,2021-05-24,2021-05-25,2021-05-28,2021-05-30,2021-05-31,2021-06-01,2021-06-02,2021-06-05,2021-06-06,2021-06-07,2021-06-08,2021-06-12,2021-06-13,2021-06-15,2021-06-16,2021-06-19,2021-06-20,2021-06-22,2021-06-23,2021-06-26,2021-06-27,2021-06-28,2021-06-29,2021-07-04,2021-07-05,2021-07-06,2021-07-07,2021-07-10,2021-07-11,2021-07-12,2021-07-13,2021-07-14,2021-07-15,2021-07-16,2021-07-17,2021-07-18,2021-07-19,2021-07-20,2021-07-21,2021-07-24,2021-07-26,2021-07-27,2021-07-31,2021-08-01,2021-08-02,2021-08-03,2021-08-06,2021-08-07,2021-08-08,2021-08-09,2021-08-10,2021-08-15,2021-08-16,2021-08-17,2021-08-18,2021-08-22,2021-08-23,2021-08-24,2021-08-25,2021-08-28,2021-08-30,2021-08-31,2021-09-05,2021-09-06,2021-09-07,2021-09-0
8,2021-09-11,2021-09-12,2021-09-13,2021-09-14,2021-09-15,2021-09-18,2021-09-19,2021-09-20,2021-09-21,2021-09-26,2021-09-27,2021-09-29,2021-10-02,2021-10-03,2021-10-04,2021-10-05,2021-10-06,2021-10-07,2021-10-10,2021-10-11,2021-10-12,2021-10-13,2021-10-17,2021-10-18,2021-10-19,2021-10-20,2021-10-23,2021-10-24,2021-10-25,2021-10-26 979 1.062612e+05 16.598570000
## 12) Date=2020-12-15,2021-02-05,2021-02-06,2021-02-07,2021-02-08,2021-02-09,2021-02-10,2021-02-11,2021-02-12,2021-02-15,2021-02-16,2021-02-17,2021-02-18,2021-02-19,2021-02-20,2021-02-21,2021-02-22,2021-02-23,2021-02-24,2021-02-25,2021-02-26,2021-02-27,2021-02-28,2021-03-01,2021-03-02,2021-03-03,2021-03-04,2021-03-05,2021-03-06,2021-03-13,2021-03-14,2021-04-11,2021-04-12,2021-04-13,2021-04-14,2021-04-15,2021-04-17,2021-04-18,2021-04-19,2021-04-21,2021-04-23,2021-04-25,2021-04-27,2021-04-28,2021-04-30,2021-05-01,2021-05-02,2021-05-03,2021-05-04,2021-05-05,2021-05-06,2021-05-07,2021-05-08,2021-05-09,2021-05-10,2021-05-11,2021-05-12,2021-05-24,2021-06-02,2021-06-06,2021-07-04,2021-07-07,2021-07-11,2021-07-13,2021-07-18,2021-07-19,2021-07-21,2021-07-24,2021-08-08,2021-08-17,2021-08-30,2021-08-31,2021-09-20,2021-09-26,2021-10-03,2021-10-07,2021-10-10,2021-10-19 368 2.056854e+04 9.339674000 *
## 13) Date=2020-11-26,2020-11-29,2020-11-30,2020-12-12,2020-12-13,2020-12-14,2020-12-19,2020-12-20,2020-12-21,2020-12-22,2020-12-23,2020-12-27,2020-12-28,2020-12-29,2020-12-30,2021-01-03,2021-01-04,2021-01-11,2021-01-17,2021-01-21,2021-01-24,2021-01-25,2021-01-26,2021-01-27,2021-02-03,2021-02-13,2021-02-14,2021-03-07,2021-03-09,2021-03-11,2021-03-12,2021-03-15,2021-03-16,2021-03-17,2021-03-18,2021-03-20,2021-03-21,2021-03-24,2021-03-27,2021-03-28,2021-03-29,2021-03-31,2021-04-03,2021-04-04,2021-04-05,2021-04-07,2021-04-10,2021-04-16,2021-04-20,2021-04-22,2021-04-24,2021-04-26,2021-04-29,2021-05-13,2021-05-17,2021-05-19,2021-05-22,2021-05-23,2021-05-25,2021-05-28,2021-05-30,2021-05-31,2021-06-01,2021-06-05,2021-06-07,2021-06-08,2021-06-12,2021-06-13,2021-06-15,2021-06-16,2021-06-19,2021-06-20,2021-06-22,2021-06-23,2021-06-26,2021-06-27,2021-06-28,2021-06-29,2021-07-05,2021-07-06,2021-07-10,2021-07-12,2021-07-14,2021-07-15,2021-07-16,2021-07-17,2021-07-20,2021-07-26,2021-07-27,2021-07-31,2021-08-01,2021-08-02,2021-08-03,2021-08-06,2021-08-07,2021-08-09,2021-08-10,2021-08-15,2021-08-16,2021-08-18,2021-08-22,2021-08-23,2021-08-24,2021-08-25,2021-08-28,2021-09-05,2021-09-06,2021-09-07,2021-09-08,2021-09-11,2021-09-12,2021-09-13,2021-09-14,2021-09-15,2021-09-18,2021-09-19,2021-09-21,2021-09-27,2021-09-29,2021-10-02,2021-10-04,2021-10-05,2021-10-06,2021-10-11,2021-10-12,2021-10-13,2021-10-17,2021-10-18,2021-10-20,2021-10-23,2021-10-24,2021-10-25,2021-10-26 611 5.462347e+04 20.970540000 *
## 7) Date=2020-11-27,2020-11-28,2020-12-01,2020-12-02,2020-12-03,2020-12-04,2020-12-05,2020-12-06,2020-12-07,2020-12-08,2020-12-09,2020-12-10,2020-12-11,2020-12-16,2020-12-17,2020-12-18,2020-12-24,2020-12-25,2020-12-26,2020-12-31,2021-01-01,2021-01-02,2021-01-05,2021-01-06,2021-01-07,2021-01-08,2021-01-09,2021-01-10,2021-01-12,2021-01-13,2021-01-14,2021-01-15,2021-01-16,2021-01-18,2021-01-19,2021-01-20,2021-01-22,2021-01-23,2021-01-28,2021-01-29,2021-01-30,2021-01-31,2021-02-01,2021-02-02,2021-02-04,2021-03-08,2021-03-10,2021-03-19,2021-03-22,2021-03-23,2021-03-25,2021-03-26,2021-03-30,2021-04-01,2021-04-02,2021-04-06,2021-04-08,2021-04-09,2021-05-14,2021-05-15,2021-05-16,2021-05-18,2021-05-20,2021-05-21,2021-05-26,2021-05-27,2021-05-29,2021-06-03,2021-06-04,2021-06-09,2021-06-10,2021-06-11,2021-06-14,2021-06-17,2021-06-18,2021-06-21,2021-06-24,2021-06-25,2021-06-30,2021-07-01,2021-07-02,2021-07-03,2021-07-08,2021-07-09,2021-07-22,2021-07-23,2021-07-25,2021-07-28,2021-07-29,2021-07-30,2021-08-04,2021-08-05,2021-08-11,2021-08-12,2021-08-13,2021-08-14,2021-08-19,2021-08-20,2021-08-21,2021-08-26,2021-08-27,2021-08-29,2021-09-01,2021-09-02,2021-09-03,2021-09-04,2021-09-09,2021-09-10,2021-09-16,2021-09-17,2021-09-22,2021-09-23,2021-09-24,2021-09-25,2021-09-28,2021-09-30,2021-10-01,2021-10-08,2021-10-09,2021-10-14,2021-10-15,2021-10-16,2021-10-21,2021-10-22,2021-10-27,2021-10-28 621 2.020447e+05 37.529790000
## 14) Time=17:00,18:00,19:00 256 5.171096e+04 28.011720000
## 28) Date=2020-12-06,2020-12-09,2020-12-16,2021-01-05,2021-01-10,2021-01-12,2021-01-13,2021-01-18,2021-01-28,2021-02-01,2021-03-08,2021-03-10,2021-03-23,2021-03-25,2021-03-26,2021-03-30,2021-04-01,2021-04-08,2021-04-09,2021-05-14,2021-05-15,2021-05-16,2021-05-18,2021-05-20,2021-05-21,2021-05-26,2021-05-27,2021-05-29,2021-06-03,2021-06-04,2021-06-11,2021-06-14,2021-06-17,2021-06-21,2021-06-24,2021-06-25,2021-06-30,2021-07-01,2021-07-02,2021-07-03,2021-07-08,2021-07-22,2021-07-23,2021-07-25,2021-07-28,2021-07-29,2021-07-30,2021-08-04,2021-08-05,2021-08-11,2021-08-12,2021-08-19,2021-08-20,2021-08-26,2021-08-29,2021-09-01,2021-09-02,2021-09-09,2021-09-22,2021-09-30,2021-10-01,2021-10-14,2021-10-21,2021-10-27,2021-10-28 131 8.283740e+03 18.587790000 *
## 29) Date=2020-11-27,2020-11-28,2020-12-01,2020-12-02,2020-12-03,2020-12-05,2020-12-07,2020-12-08,2020-12-10,2020-12-11,2020-12-17,2020-12-18,2020-12-24,2020-12-25,2020-12-26,2020-12-31,2021-01-01,2021-01-02,2021-01-06,2021-01-07,2021-01-08,2021-01-09,2021-01-14,2021-01-15,2021-01-19,2021-01-20,2021-01-22,2021-01-23,2021-01-29,2021-01-30,2021-01-31,2021-02-02,2021-02-04,2021-03-19,2021-03-22,2021-04-02,2021-06-10,2021-06-18,2021-07-09,2021-08-13,2021-08-14,2021-08-21,2021-08-27,2021-09-03,2021-09-04,2021-09-10,2021-09-16,2021-09-17,2021-09-23,2021-09-24,2021-09-25,2021-09-28,2021-10-08,2021-10-09,2021-10-15,2021-10-16,2021-10-22 125 1.960043e+04 37.888000000 *
## 15) Time=20:00,21:00,22:00,23:00 365 1.108756e+05 44.205480000
## 30) Date=2020-11-27,2020-11-28,2020-12-01,2020-12-02,2020-12-03,2020-12-04,2020-12-05,2020-12-06,2020-12-07,2020-12-09,2020-12-16,2020-12-17,2020-12-18,2020-12-24,2020-12-25,2020-12-26,2020-12-31,2021-01-02,2021-01-09,2021-01-10,2021-01-12,2021-01-16,2021-01-18,2021-01-19,2021-01-22,2021-01-23,2021-01-30,2021-01-31,2021-02-01,2021-02-02,2021-02-04,2021-03-10,2021-03-19,2021-03-22,2021-03-25,2021-03-26,2021-03-30,2021-04-01,2021-04-06,2021-05-14,2021-05-16,2021-05-18,2021-05-20,2021-05-29,2021-06-03,2021-06-09,2021-06-14,2021-06-18,2021-06-21,2021-06-30,2021-07-02,2021-07-03,2021-07-08,2021-07-25,2021-07-28,2021-08-04,2021-08-05,2021-08-11,2021-08-14,2021-08-21,2021-08-27,2021-08-29,2021-09-01,2021-09-04,2021-09-25,2021-09-28,2021-10-09,2021-10-16,2021-10-22,2021-10-27 211 3.011762e+04 35.232230000 *
## 31) Date=2020-12-08,2020-12-10,2020-12-11,2021-01-01,2021-01-05,2021-01-06,2021-01-07,2021-01-08,2021-01-13,2021-01-14,2021-01-15,2021-01-20,2021-01-28,2021-01-29,2021-03-08,2021-03-23,2021-04-02,2021-04-08,2021-04-09,2021-05-15,2021-05-21,2021-05-26,2021-05-27,2021-06-04,2021-06-10,2021-06-11,2021-06-17,2021-06-24,2021-06-25,2021-07-01,2021-07-09,2021-07-22,2021-07-23,2021-07-29,2021-07-30,2021-08-12,2021-08-13,2021-08-19,2021-08-20,2021-08-26,2021-09-02,2021-09-03,2021-09-09,2021-09-10,2021-09-16,2021-09-17,2021-09-22,2021-09-23,2021-09-24,2021-09-30,2021-10-01,2021-10-08,2021-10-14,2021-10-15,2021-10-21,2021-10-28 154 4.049050e+04 56.500000000 *
# train model
results <- fit_resamples(dt_mod, model_recipe, kfold)
# model results
collect_metrics(results)
## # A tibble: 2 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 rmse standard 8.21 5 0.210 Preprocessor1_Model1
## 2 rsq standard 0.683 5 0.0135 Preprocessor1_Model1
dt_fit %>%
  extract_fit_parsnip() %>%  # pull_workflow_fit() was deprecated in workflows 0.2.3
  vip(20)
Time and Date are the most important variables.
Time_vig <- plot_ly(visit.dat, y = ~ValueIn, type = 'bar', color = ~Time) %>%
  layout(barmode = 'stack')
Time_vig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
ggplot(visit.dat, aes(x = factor(Day), y = ValueIn, fill = factor(Day))) +
  geom_col(position = "dodge")
###################################
# Time Series For visitor count
###################################
# Dropping the observations from 2020-11-26 through 2020-11-30, since this partial month won't contribute a full monthly effect
visit.dat <- slice(visit.dat,-c(1:120))
# Using aggregate() function to prepare dataset for plotting and time series analysis
visit.dat <- aggregate(ValueIn ~ Day + Time , visit.dat, mean)
## Create a daily Date sequence to index the series
inds <- seq(as.Date("2020-12-01"), as.Date("2021-10-29"), by = "day")
## Create a time series object
v.series <- ts(visit.dat[,3],
start = c(2020, as.numeric(format(inds[1], "%j"))))
###########################################
# Preliminary Analysis
###########################################
# Time Plot
autoplot(v.series) +
  ggtitle("Time Plot: Daily visitors") +
  xlab("Month") + ylab("Visitor count") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
# The data show some trend. Investigate transformations.
#Take the first difference of the data to remove the trend.
DY <- diff(v.series)
# Time plot of the differenced data
autoplot(DY) +
ggtitle("Time Plot: Change in Daily visitors") +
  ylab("Number of visitors")
# The differenced series appears stationary; use it to investigate seasonality.
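The effect of first-differencing can be checked on synthetic data. This is a minimal base-R sketch (the simulated trend and noise are illustrative, not the cafe data):

```r
set.seed(1)
t <- 1:200
y <- 0.5 * t + rnorm(200)        # linear trend plus noise
dy <- diff(y)                    # first difference removes the trend

# the differenced series fluctuates around the slope (0.5) with no trend left
mean(dy)
coef(lm(dy ~ seq_along(dy)))[2]  # residual slope, close to zero
```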
# ggseasonplot(DY) and ggsubseriesplot(DY) error here: both require a seasonal
# ts (frequency > 1), and this series was built with the default frequency = 1.
# ggseasonplot(DY) +
#   ggtitle("Seasonal Plot: Change in Daily number of visitors")
# ggsubseriesplot(DY) # Data are not seasonal
############################################
# ARMA MODEL
############################################
#This will plot the time series
ts.plot(v.series, xlab="month", ylab="Daily visitors", main="Daily visitors")
# This will fit in a line
abline(reg = lm(v.series ~ time(v.series)))
# Autocorrelation: the ACF describes how strongly the present value of the
# series is related to its past values
acf(v.series)
############################################
# Fitting the AR model to the time series
############################################
AR <- arima(v.series, order = c(1,0,0))
print(AR)
##
## Call:
## arima(x = v.series, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.7592 8.3312
## s.e. 0.0495 2.4213
##
## sigma^2 estimated as 59.21: log likelihood = -581.62, aic = 1169.24
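As a sanity check on what `arima(order = c(1,0,0))` estimates, one can fit it to a simulated AR(1) series and confirm that the true coefficient is recovered. A hedged sketch on synthetic data, not the cafe series:

```r
set.seed(42)
sim <- arima.sim(model = list(ar = 0.75), n = 500)  # true ar1 = 0.75
fit <- arima(sim, order = c(1, 0, 0))
coef(fit)["ar1"]  # should land near 0.75
```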
#plotting the series along with the fitted values
ts.plot(v.series)
AR_fit <- v.series - residuals(AR)
points(AR_fit, type = "l", col = 2, lty = 2)
# Forecasting using the AR model
#Using predict() to make a 1-step forecast
predict_AR <- predict(AR)
#Obtaining the 1-step forecast using $pred[1]
predict_AR$pred[1]
## [1] 2.607299
# Alternatively, using predict() to make 1-step through 10-step forecasts
predict(AR, n.ahead = 10)
## $pred
## Time Series:
## Start = 2523
## End = 2532
## Frequency = 1
## [1] 2.607299 3.985703 5.032168 5.826630 6.429775 6.887674 7.235305 7.499222
## [9] 7.699583 7.851695
##
## $se
## Time Series:
## Start = 2523
## End = 2532
## Frequency = 1
## [1] 7.694655 9.660896 10.630213 11.150680 11.439907 11.603332 11.696487
## [8] 11.749843 11.780485 11.798110
#plotting the series plus the forecast and 95% prediction intervals
ts.plot(v.series)
AR_forecast <- predict(AR, n.ahead = 10)$pred
AR_forecast_se <- predict(AR, n.ahead = 10)$se
points(AR_forecast, type = "l", col = 2)
points(AR_forecast - 2*AR_forecast_se, type = "l", col = 2, lty = 2)
points(AR_forecast + 2*AR_forecast_se, type = "l", col = 2, lty = 2)
############################################
# Fit the MA model
############################################
# Fitting the MA model to the visitor series
MA <- arima(v.series, order = c(0,0,1))
print(MA)
##
## Call:
## arima(x = v.series, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.4743 8.5748
## s.e. 0.0521 1.1082
##
## sigma^2 estimated as 95.29: log likelihood = -621.29, aic = 1248.58
#plotting the series along with the MA fitted values
ts.plot(v.series)
MA_fit <- v.series - resid(MA)
points(MA_fit, type = "l", col = 2, lty = 2)
# Forecasting using the MA model
#Making a 1-step forecast based on MA
predict_MA <- predict(MA)
#Obtaining the 1-step forecast using $pred[1]
predict_MA$pred[1]
## [1] 5.978835
# Alternatively, making a 1-step through 10-step forecast based on MA
predict(MA, n.ahead = 10)
## $pred
## Time Series:
## Start = 2523
## End = 2532
## Frequency = 1
## [1] 5.978835 8.574797 8.574797 8.574797 8.574797 8.574797 8.574797 8.574797
## [9] 8.574797 8.574797
##
## $se
## Time Series:
## Start = 2523
## End = 2532
## Frequency = 1
## [1] 9.761522 10.803969 10.803969 10.803969 10.803969 10.803969 10.803969
## [8] 10.803969 10.803969 10.803969
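Note how the MA(1) forecasts above revert to a constant (the fitted intercept) after the first step: an MA(q) model carries forecast information for only q steps ahead. A small sketch on simulated data illustrates this:

```r
set.seed(7)
sim <- arima.sim(model = list(ma = 0.5), n = 300)
ma1 <- arima(sim, order = c(0, 0, 1))
fc <- predict(ma1, n.ahead = 5)$pred
fc  # steps 2..5 are identical: the forecast collapses to the intercept
```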
# Plotting the visitor series plus the forecast and 95% prediction intervals
ts.plot(v.series)
MA_forecasts <- predict(MA, n.ahead = 10)$pred
MA_forecast_se <- predict(MA, n.ahead = 10)$se
points(MA_forecasts, type = "l", col = 2)
points(MA_forecasts - 2*MA_forecast_se, type = "l", col = 2, lty = 2)
points(MA_forecasts + 2*MA_forecast_se, type = "l", col = 2, lty = 2)
# Choosing between AR and MA: compare the fits and their information criteria
# Find the correlation between AR_fit and MA_fit
cor(AR_fit, MA_fit)
## [1] 0.9269574
# Find AIC of AR
AIC(AR)
## [1] 1169.239
# Find AIC of MA
AIC(MA)
## [1] 1248.577
# Find BIC of AR
BIC(AR)
## [1] 1178.611
# Find BIC of MA
BIC(MA)
## [1] 1257.949
# The AR model has the lower AIC and BIC, so I will go with the AR model.
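The same comparison can be reproduced on synthetic data: when the generating process is AR, the correctly specified AR fit scores lower on AIC than a misspecified MA fit. A sketch, illustrative only:

```r
set.seed(11)
sim <- arima.sim(model = list(ar = 0.7), n = 400)  # AR(1)-generated data
ar_fit <- arima(sim, order = c(1, 0, 0))
ma_fit <- arima(sim, order = c(0, 0, 1))
AIC(ar_fit) < AIC(ma_fit)  # TRUE: the true model family wins
```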
############################################
#An ARIMA MODEL
############################################
## use auto.arima to choose ARIMA terms
fit.arima <- auto.arima(v.series, d = 1, D = 1, stepwise = FALSE, approximation = FALSE, trace = TRUE)
# d = 1 takes the first difference; trace prints every candidate model. Residual SD = 6.472248
##
## ARIMA(0,1,0) : 1179.261
## ARIMA(0,1,0) with drift : 1181.309
## ARIMA(0,1,1) : 1122.093
## ARIMA(0,1,1) with drift : 1124.167
## ARIMA(0,1,2) : 1121.729
## ARIMA(0,1,2) with drift : 1123.828
## ARIMA(0,1,3) : 1123.683
## ARIMA(0,1,3) with drift : 1125.808
## ARIMA(0,1,4) : 1119.371
## ARIMA(0,1,4) with drift : 1121.523
## ARIMA(0,1,5) : 1106.666
## ARIMA(0,1,5) with drift : 1108.845
## ARIMA(1,1,0) : 1127.438
## ARIMA(1,1,0) with drift : 1129.512
## ARIMA(1,1,1) : 1121.405
## ARIMA(1,1,1) with drift : 1123.505
## ARIMA(1,1,2) : 1122.731
## ARIMA(1,1,2) with drift : 1124.857
## ARIMA(1,1,3) : 1124.714
## ARIMA(1,1,3) with drift : 1126.867
## ARIMA(1,1,4) : 1107.301
## ARIMA(1,1,4) with drift : 1109.48
## ARIMA(2,1,0) : 1125.1
## ARIMA(2,1,0) with drift : 1127.199
## ARIMA(2,1,1) : 1123.235
## ARIMA(2,1,1) with drift : 1125.361
## ARIMA(2,1,2) : 1124.805
## ARIMA(2,1,2) with drift : 1126.957
## ARIMA(2,1,3) : 1126.901
## ARIMA(2,1,3) with drift : 1129.081
## ARIMA(3,1,0) : 1121.027
## ARIMA(3,1,0) with drift : 1123.153
## ARIMA(3,1,1) : 1123.139
## ARIMA(3,1,1) with drift : 1125.291
## ARIMA(3,1,2) : 1107.077
## ARIMA(3,1,2) with drift : 1109.257
## ARIMA(4,1,0) : 1123.125
## ARIMA(4,1,0) with drift : 1125.277
## ARIMA(4,1,1) : 1125.305
## ARIMA(4,1,1) with drift : 1127.484
## ARIMA(5,1,0) : 1123.481
## ARIMA(5,1,0) with drift : 1125.66
##
##
##
## Best model: ARIMA(0,1,5)
#Model evaluation
print(summary(fit.arima))
## Series: v.series
## ARIMA(0,1,5)
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5
## -0.5187 0.0246 -0.2964 0.5532 -0.2723
## s.e. 0.0752 0.0719 0.0649 0.0793 0.0643
##
## sigma^2 estimated as 41.89: log likelihood=-547.07
## AIC=1106.14 AICc=1106.67 BIC=1124.85
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.004934802 6.355837 3.185637 NaN Inf 0.7677304 -0.03316768
checkresiduals(fit.arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,5)
## Q* = 30.16, df = 5, p-value = 1.372e-05
##
## Model df: 5. Total lags used: 10
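The Ljung-Box statistic that `checkresiduals()` reports can also be computed directly with `Box.test()`. This sketch (on simulated data, not the model residuals) shows how the test flags autocorrelation in a random walk:

```r
set.seed(3)
rw <- cumsum(rnorm(200))                            # random walk: strongly autocorrelated
Box.test(rw, lag = 10, type = "Ljung-Box")$p.value  # essentially zero: reject white noise
wn <- rnorm(200)                                    # white noise: no autocorrelation
Box.test(wn, lag = 10, type = "Ljung-Box")$p.value  # typically large: fail to reject
```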
## forecast the next 100 time points
forecast_arima <- forecast(fit.arima, h = 100)
## plot it
plot(forecast_arima)
###########################################
#Fit ETS method
###########################################
fit_ets <- ets(v.series) # Residual SD = 6.849643
print(summary(fit_ets))
## ETS(A,N,N)
##
## Call:
## ets(y = v.series)
##
## Smoothing parameters:
## alpha = 0.4156
##
## Initial states:
## l = 1.4505
##
## sigma: 6.8908
##
## AIC AICc BIC
## 1513.356 1513.502 1522.728
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.007053437 6.849643 3.48906 -Inf Inf 0.8408546 -0.07499169
checkresiduals(fit_ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 54.674, df = 8, p-value = 5.109e-09
##
## Model df: 2. Total lags used: 10
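ETS(A,N,N), the model `ets()` selected here, is simple exponential smoothing; base R's `HoltWinters()` with the trend and seasonal components disabled fits the same model. A sketch on simulated data, assuming only base R:

```r
set.seed(5)
y <- ts(10 + arima.sim(model = list(ar = 0.4), n = 120))
ses <- HoltWinters(y, beta = FALSE, gamma = FALSE)  # SES = ETS(A,N,N)
ses$alpha                  # estimated smoothing parameter in [0, 1]
predict(ses, n.ahead = 5)  # flat forecasts, just as ETS(A,N,N) produces
```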
############################################
# Facebook Prophet
#not working well on this data
############################################
# Adjusting the data
visit.dat <- read.csv("~/Documents/GitHub/DS_Capstone_Cafe_Analysis/Hourlydata.csv")
# Dropping the observations from 2020-11-26 through 2020-11-30, since this partial month won't contribute a full monthly effect
visit.dat <- slice(visit.dat,-c(1:120))
visit.dat <- select(visit.dat, c(1,3))
names(visit.dat) <- c('ds', 'y')
# Making a Forecast
fb.ts <- prophet(visit.dat, yearly.seasonality=FALSE ,daily.seasonality=FALSE)
future <- make_future_dataframe(fb.ts , periods=365)
tail(future)
## ds
## 693 2022-10-24
## 694 2022-10-25
## 695 2022-10-26
## 696 2022-10-27
## 697 2022-10-28
## 698 2022-10-29
# Forecasting
forecast <- predict(fb.ts, future)
tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
## ds yhat yhat_lower yhat_upper
## 693 2022-10-24 8.268133 -15.222656 32.10870
## 694 2022-10-25 8.625258 -14.634970 31.54612
## 695 2022-10-26 9.360516 -13.319796 32.98290
## 696 2022-10-27 12.472258 -10.028295 36.34948
## 697 2022-10-28 13.801397 -8.205279 38.34382
## 698 2022-10-29 9.756179 -14.117433 32.67423
plot(fb.ts, forecast)
prophet_plot_components(fb.ts, forecast)
To increase the number of visitors, and thus profits, I suggest focusing on weekends, since they draw the most people. Removing unprofitable products from the menu and stocking more of the profitable ones may also help.
In the future I plan to deploy the model and feed it new data once the cafe completes a full year from its opening date.
The main problem I faced is that, as noted above, there is no seasonality in the data; I suspect this is due to the COVID-19 lockdown and the limit on how many customers were allowed inside the cafe at once. The data itself was also a limitation, since it covers a period of less than a year.